
File Paths and Managing Files

File Paths

Understanding File Paths

A file path is a string that identifies the location of a file or directory on a computer or web server. File paths are crucial in programming for accessing, modifying, and organizing files within applications.

Types of File Paths

Relative File Paths

  • Locate a file relative to a starting directory, often using the file name alone.
  • Are resolved against the current working directory, which is usually the directory from which the Python script is run.
  • Preferred for their flexibility across different systems.

Absolute File Paths

  • Specify the exact location of a file, starting from the root directory (or the drive letter on Windows) and ending with the file name.
  • Vary between operating systems:
    • Windows: C:/my-directory/target-file.txt
    • Mac/Linux: /users/username/my-directory/target-file.txt
  • Generally avoided due to lack of portability.
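
As a rough illustration of the difference between the two path types, the sketch below checks whether a path is absolute and resolves a relative one against the current working directory. The directory and file names are placeholders borrowed from the examples above, and the file does not need to exist for these calls to work.

```python
import os

# Placeholder names; the file does not have to exist for these checks.
relative_path = os.path.join('my-directory', 'target-file.txt')

# abspath() resolves a relative path against the current working directory.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
print(absolute_path)                 # Differs from machine to machine
```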

Using File Paths in Python

  • Cross-Platform Compatibility: Use the os.path module to handle differences between operating systems.
  • Environment Variables: File paths can also be built from environment variables, which keeps machine-specific locations (such as a data or library directory) out of the code.
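
One possible way to combine an environment variable with os.path is sketched below. DATA_DIR is a hypothetical variable name chosen for this example, not something Python or the operating system defines, and the fallback to the current working directory is just one reasonable choice.

```python
import os

# DATA_DIR is a hypothetical variable name used only for illustration.
# Fall back to the current working directory when it is not set.
data_directory = os.environ.get('DATA_DIR', os.getcwd())

# join() inserts the separator that is correct for the current platform.
file_path = os.path.join(data_directory, 'target-file.txt')
print(file_path)
```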

How to Write File Paths in Code

File Paths and Operating Systems

  • Windows:
    • Uses drive letters and backslashes: C:\my-directory\target-file.txt
    • Backslashes start escape sequences in Python string literals, so they must be doubled (\\) or written inside a raw string (r'...').
  • Mac/Linux:
    • Use forward slashes and start from the root directory: /users/username/my-directory/target-file.txt
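
The escaping problem is easiest to see with a path whose folder names happen to begin with t. The sketch below uses a made-up Windows path (C:\temp is an assumption for illustration) and compares an unescaped literal with the two safe spellings.

```python
# '\t' inside an ordinary string literal is read as a tab character,
# so this made-up Windows path is silently corrupted.
broken = 'C:\temp\target-file.txt'

escaped = 'C:\\temp\\target-file.txt'   # doubled backslashes
raw = r'C:\temp\target-file.txt'        # raw string literal

print('\t' in broken)   # True
print(escaped == raw)   # True
print(raw)              # C:\temp\target-file.txt
```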

Best Practices

  • Use Forward Slashes: Even on Windows, using forward slashes (/) avoids issues with escape characters.
    • Example: C:/my-directory/target-file.txt
  • Avoid Absolute Paths: Use relative paths or dynamically construct paths for portability.
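
To see that Windows path handling accepts forward slashes, the sketch below uses ntpath, the Windows implementation behind os.path, which can be imported on any operating system. It is only a demonstration of how the path is interpreted; ordinary code would normally call os.path directly.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

forward = 'C:/my-directory/target-file.txt'

# normpath() converts forward slashes to the native Windows separator.
print(ntpath.normpath(forward))   # C:\my-directory\target-file.txt

# The forward-slash form is split into components correctly.
print(ntpath.basename(forward))   # target-file.txt
```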

The os Module in Python

  • Accessing the Current Working Directory:
    import os
    current_directory = os.getcwd()
  • Constructing File Paths:
    file_path = os.path.join(current_directory, 'target-file.txt')
  • Listing Files and Directories:
    contents = os.listdir(current_directory)

Examples

Deleting a File

import os

# Delete a file
os.remove('obsolete-file.txt')

Renaming a File

import os

# Rename a file
os.rename('old-name.txt', 'new-name.txt')

Working with Files

File Operations with the os Module

  • Deleting Files: os.remove('filename')
  • Renaming Files: os.rename('old_name', 'new_name')
  • Moving Files: Use shutil.move('source', 'destination') from the shutil module.
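
The three operations can be tried safely inside a temporary directory, as in the sketch below. The archive folder name is an assumption made for this example; the other file names follow the ones used earlier.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, 'old-name.txt')
    archive_dir = os.path.join(workspace, 'archive')  # hypothetical folder
    os.mkdir(archive_dir)

    with open(source, 'w') as handle:
        handle.write('example')

    # Rename in place, then move the renamed file into the archive folder.
    renamed = os.path.join(workspace, 'new-name.txt')
    os.rename(source, renamed)
    moved = shutil.move(renamed, archive_dir)  # returns the new location

    moved_exists = os.path.exists(moved)
    source_exists = os.path.exists(source)

print(moved_exists, source_exists)  # True False
```

When the destination passed to shutil.move() is an existing directory, the file keeps its name and is placed inside that directory.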

Checking File Existence

  • Using os.path.exists():
    import os

    if os.path.exists('important-file.txt'):
        print('File exists.')
    else:
        print('File does not exist.')
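
Checking first and acting afterwards leaves a short window in which the file can disappear. A common alternative is to attempt the operation and handle the error, as in this sketch. The helper name remove_if_present is hypothetical and not part of the os module.

```python
import os
import tempfile

def remove_if_present(path):
    """Delete the file at path; return False if it did not exist."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True

with tempfile.TemporaryDirectory() as workspace:
    target = os.path.join(workspace, 'obsolete-file.txt')
    with open(target, 'w') as handle:
        handle.write('temporary')

    first_attempt = remove_if_present(target)    # file existed
    second_attempt = remove_if_present(target)   # already gone

print(first_attempt, second_attempt)  # True False
```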

More File Information

Getting File Metadata

  • File Size:
    import os

    size = os.path.getsize('example.txt')
    print(f'File size: {size} bytes')
  • Last Modification Time:
    import os
    import datetime

    timestamp = os.path.getmtime('example.txt')
    modification_time = datetime.datetime.fromtimestamp(timestamp)
    print(f'Last modified: {modification_time}')
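
When both values are needed, os.stat() returns them in a single call. The sketch below writes a small temporary file so that the expected size is known; the file name and contents are placeholders.

```python
import datetime
import os
import tempfile

with tempfile.TemporaryDirectory() as workspace:
    path = os.path.join(workspace, 'example.txt')
    with open(path, 'w') as handle:
        handle.write('hello')  # five ASCII characters, so five bytes

    info = os.stat(path)  # one call, several metadata fields
    size = info.st_size
    modified = datetime.datetime.fromtimestamp(info.st_mtime)

print(f'File size: {size} bytes')  # File size: 5 bytes
print(f'Last modified: {modified}')
```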

Working with Timestamps

  • Unix Timestamps: Represent the number of seconds since January 1, 1970, 00:00:00 UTC.
  • Converting Timestamps:
    import datetime

    timestamp = 1609459200 # Example timestamp
    readable_time = datetime.datetime.fromtimestamp(timestamp)
    print(readable_time) # Outputs 2021-01-01 00:00:00 when the local time zone is UTC
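
Because fromtimestamp() converts to local time by default, the printed value depends on the machine's time zone. Passing an explicit time zone gives the same result everywhere, as in this short sketch using the same example timestamp.

```python
import datetime

timestamp = 1609459200  # Same example timestamp as above

# Ask for UTC explicitly so the result does not depend on local settings.
utc_time = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

print(utc_time)              # 2021-01-01 00:00:00+00:00
print(utc_time.timestamp())  # 1609459200.0
```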

Absolute Paths

  • Getting Absolute Paths:
    import os

    absolute_path = os.path.abspath('relative/path/to/file.txt')
    print(absolute_path)

Directories

Working with Directories

Getting the Current Working Directory

import os

current_directory = os.getcwd()
print(f'Current directory: {current_directory}')

Creating Directories

import os

# Create a new directory
os.mkdir('new_directory')
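
os.mkdir() creates one level only and raises FileExistsError if the directory is already there. For nested directories, os.makedirs() is a common choice. The sketch below runs in a temporary directory, and the folder names are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workspace:
    nested = os.path.join(workspace, 'reports', '2024', 'drafts')  # placeholder names

    # makedirs() creates every missing parent; exist_ok=True makes it
    # safe to call again when the directory is already there.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    created = os.path.isdir(nested)

    # mkdir() creates a single level and refuses to overwrite.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

print(created, mkdir_failed)  # True True
```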

Changing Directories

import os

# Change to a different directory
os.chdir('new_directory')

Removing Directories

  • Remove Empty Directory:
    import os

    os.rmdir('obsolete_directory')
  • Remove Non-Empty Directory:
    import shutil

    shutil.rmtree('obsolete_directory')
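
The difference between the two calls can be observed in a temporary directory. In this sketch the notes.txt file is a placeholder that makes the directory non-empty.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workspace:
    target = os.path.join(workspace, 'obsolete_directory')
    os.mkdir(target)
    with open(os.path.join(target, 'notes.txt'), 'w') as handle:
        handle.write('still here')

    # rmdir() only removes empty directories.
    try:
        os.rmdir(target)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # rmtree() deletes the directory and everything inside it, permanently.
    shutil.rmtree(target)
    still_exists = os.path.exists(target)

print(rmdir_failed, still_exists)  # True False
```

Since shutil.rmtree() does not use a recycle bin, it is worth double-checking the path before calling it.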

Listing Directory Contents

import os

# List files and directories
contents = os.listdir('.')
for item in contents:
    print(item)

  • Differentiating Files and Directories:
    import os

    for item in os.listdir('.'):
        if os.path.isdir(item):
            print(f'{item}/')
        else:
            print(item)
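
os.listdir() returns bare names, so os.path.isdir(item) only gives the right answer when the listed directory is also the current working directory. For any other directory, the name has to be joined with the directory first. A minimal sketch follows; split_entries is a hypothetical helper name.

```python
import os
import tempfile

def split_entries(directory):
    """Return (files, directories) found directly inside directory."""
    files, directories = [], []
    for name in os.listdir(directory):
        # Rebuild the full path before testing what kind of entry it is.
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            directories.append(name)
        else:
            files.append(name)
    return sorted(files), sorted(directories)

with tempfile.TemporaryDirectory() as workspace:
    os.mkdir(os.path.join(workspace, 'subfolder'))
    with open(os.path.join(workspace, 'file.txt'), 'w') as handle:
        handle.write('example')

    files, directories = split_entries(workspace)

print(files, directories)  # ['file.txt'] ['subfolder']
```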

Constructing File Paths

  • Using os.path.join():
    import os

    path = os.path.join('folder', 'subfolder', 'file.txt')
    print(path) # Outputs: folder/subfolder/file.txt (folder\subfolder\file.txt on Windows)
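
The reverse operation, taking a path apart, uses functions from the same module. The sketch below splits the path built above into its directory, file name, and extension.

```python
import os

path = os.path.join('folder', 'subfolder', 'file.txt')

# split() separates the last component from everything before it.
directory, file_name = os.path.split(path)

# splitext() separates the extension (including the dot) from the name.
stem, extension = os.path.splitext(file_name)

print(file_name)   # file.txt
print(stem)        # file
print(extension)   # .txt
print(directory == os.path.join('folder', 'subfolder'))  # True
```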

Working with CSV Files Using Pandas

What is a CSV File?

  • Definition: A Comma-Separated Values (CSV) file is a plain text file that uses commas to separate values.
  • Usage: Commonly used for importing and exporting data for spreadsheets and databases.
  • Structure:
    Name,Department,Salary
    Aisha Khan,Engineering,80000
    Jules Lee,Marketing,67000
    Queenie Corbit,Human Resources,90000

Introduction to Pandas

  • Pandas: An open-source Python library providing high-performance data manipulation and analysis tools.
  • Advantages over csv Module:
    • Simplifies reading and writing data.
    • Handles complex data operations.
    • Provides DataFrame objects for easy data manipulation.

Reading CSV Files with Pandas

Importing Pandas

import pandas as pd

Reading a CSV File

# Read the CSV file into a DataFrame
df = pd.read_csv('employees.csv')

# Display the DataFrame
print(df)

Output:

             Name       Department  Salary
0      Aisha Khan      Engineering   80000
1       Jules Lee        Marketing   67000
2  Queenie Corbit  Human Resources   90000

Accessing Data

  • Accessing Columns:
    # Get the 'Name' column
    names = df['Name']
  • Iterating Over Rows:
    for index, row in df.iterrows():
        print(f"{row['Name']} works in {row['Department']}")

Writing CSV Files with Pandas

Creating a DataFrame

import pandas as pd

# Define data as a dictionary
data = {
    'Name': ['Carlos Rodriguez', 'Li Wei', 'Fatima Zahra'],
    'Department': ['IT', 'Finance', 'Marketing'],
    'Salary': [75000, 82000, 73000]
}

# Create a DataFrame
df = pd.DataFrame(data)

Writing to a CSV File

# Write the DataFrame to a CSV file
df.to_csv('new_employees.csv', index=False)

Resulting new_employees.csv:

Name,Department,Salary
Carlos Rodriguez,IT,75000
Li Wei,Finance,82000
Fatima Zahra,Marketing,73000

Reading and Writing CSV Files with Specific Options

Handling Missing Data

# Read CSV while handling missing values
df = pd.read_csv('employees.csv', na_values=['Not Available', 'NA'])
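
The effect of na_values can be checked without a file on disk by reading from an in-memory string. The data below is a hypothetical variant of the employees file in which one department has been replaced by a placeholder, and the 'Unassigned' label is likewise chosen only for this example.

```python
import io
import pandas as pd

# Hypothetical variant of the employees data with one missing department.
csv_text = """Name,Department,Salary
Aisha Khan,Engineering,80000
Jules Lee,Not Available,67000
"""

df = pd.read_csv(io.StringIO(csv_text), na_values=['Not Available', 'NA'])

# The placeholder text is now a real missing value (NaN).
missing_count = int(df['Department'].isna().sum())
print(missing_count)  # 1

# Replace the gap with a label chosen for this example.
df['Department'] = df['Department'].fillna('Unassigned')
print(df['Department'].tolist())  # ['Engineering', 'Unassigned']
```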

Specifying Delimiters

# Read a CSV file with semicolon delimiter
df = pd.read_csv('employees.csv', delimiter=';')
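
What happens when the delimiter is not specified can be seen in the sketch below, which reads a one-row, semicolon-separated sample from an in-memory string. The sample text is made up for illustration.

```python
import io
import pandas as pd

semicolon_text = "Name;Department;Salary\nAisha Khan;Engineering;80000\n"

# With the default comma delimiter, each line becomes a single column.
wrong = pd.read_csv(io.StringIO(semicolon_text))

# With the delimiter given, the three columns are recognised.
right = pd.read_csv(io.StringIO(semicolon_text), delimiter=';')

print(list(wrong.columns))  # ['Name;Department;Salary']
print(list(right.columns))  # ['Name', 'Department', 'Salary']
```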

Writing without Index

  • Exclude Index Column:
    df.to_csv('employees.csv', index=False)

Data Manipulation with Pandas

Filtering Data

# Filter employees with salary greater than 80000
high_earners = df[df['Salary'] > 80000]
print(high_earners)

Adding New Columns

# Add a new column for bonus
df['Bonus'] = df['Salary'] * 0.10

Modifying Data

# Increase salary by 5%
df['Salary'] = df['Salary'] * 1.05
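
Filters can also be combined, and arithmetic on a column can change its data type. The sketch below reuses the sample employee data from the DataFrame example; the salary threshold of 74000 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Carlos Rodriguez', 'Li Wei', 'Fatima Zahra'],
    'Department': ['IT', 'Finance', 'Marketing'],
    'Salary': [75000, 82000, 73000],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
selected = df[(df['Salary'] > 74000) & (df['Department'] != 'Finance')]
print(selected['Name'].tolist())  # ['Carlos Rodriguez']

# Multiplying by a float turns the integer column into floats.
df['Salary'] = df['Salary'] * 1.05
print(df['Salary'].dtype)  # float64
```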

Advantages of Using Pandas

  • Powerful Data Structures: DataFrames and Series.
  • Easy Data Cleaning: Handling missing data and duplicates.
  • Data Analysis Tools: Statistical functions and aggregation.
  • Integration with Other Libraries: Works well with NumPy and Matplotlib.

Practical Example: Processing CSV Data with Pandas

Scenario

You have a CSV file inventory.csv containing inventory data:

Item,Quantity,Price
Laptop,20,1500
Mouse,150,20
Keyboard,85,45
Monitor,40,300

Reading the CSV File

import pandas as pd

# Read the CSV file
inventory = pd.read_csv('inventory.csv')

Calculating Total Inventory Value

# Add a new column for total value per item
inventory['TotalValue'] = inventory['Quantity'] * inventory['Price']

# Calculate the total inventory value
total_inventory_value = inventory['TotalValue'].sum()
print(f'Total Inventory Value: ${total_inventory_value}')

Output:

Total Inventory Value: $48825
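
The per-item values behind that total can be checked by loading the same four rows from an in-memory string instead of a file, as in this sketch.

```python
import io
import pandas as pd

csv_text = """Item,Quantity,Price
Laptop,20,1500
Mouse,150,20
Keyboard,85,45
Monitor,40,300
"""

inventory = pd.read_csv(io.StringIO(csv_text))
inventory['TotalValue'] = inventory['Quantity'] * inventory['Price']

# Value per item, in the same order as the rows above.
print(inventory['TotalValue'].tolist())    # [30000, 3000, 3825, 12000]
print(int(inventory['TotalValue'].sum()))  # 48825
```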

Saving the Updated Inventory to a New CSV File

# Save the updated inventory to a new CSV file
inventory.to_csv('updated_inventory.csv', index=False)

Conclusion

Using Pandas for CSV operations provides a robust and efficient way to handle data. It simplifies the process of reading, writing, and manipulating CSV files, making data analysis tasks more straightforward.

